951 results for Word analogy


Relevance:

70.00%

Publisher:

Abstract:

Recent advances in neural language models have contributed new methods for learning distributed vector representations of words (also called word embeddings). Two such methods are the continuous bag-of-words model and the skip-gram model. These methods have been shown to produce embeddings that capture higher-order relationships between words and that are highly effective in natural language processing tasks involving word similarity and word analogy. Despite these promising results, there has been little analysis of the use of these word embeddings for retrieval. Motivated by these observations, in this paper we set out to determine how these word embeddings can be used within a retrieval model and what the benefit might be. To this end, we use neural word embeddings within the well-known translation language model for information retrieval. This language model captures implicit semantic relations between the words in queries and those in relevant documents, thus producing more accurate estimates of document relevance. The word embeddings used to estimate neural language models produce translations that differ from those of previous translation language model approaches, and these differences deliver improvements in retrieval effectiveness. The models are robust to choices made in building word embeddings and, even more so, our results show that embeddings do not even need to be produced from the same corpus being used for retrieval.
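A minimal sketch of how embeddings might feed a translation language model, under stated assumptions: the translation probability p_t(w | u) is taken to be the cosine similarity between the two word vectors, normalised over the vocabulary, and p(u | d) is a plain maximum-likelihood estimate without smoothing. The abstract does not give these formulas, so the function names, the toy vectors in `EMB`, and the estimator itself are illustrative rather than the authors' exact model.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def translation_prob(w, u, emb):
    """Assumed estimator: p_t(w | u) = cos(u, w) / sum over v of cos(u, v)."""
    total = sum(cosine(emb[u], emb[v]) for v in emb)
    return cosine(emb[u], emb[w]) / total

def query_likelihood(query, doc, emb):
    """p(q | d) = product over query words w of sum over document words u
    of p_t(w | u) * p(u | d), with p(u | d) = count(u, d) / len(d)."""
    score = 1.0
    for w in query:
        score *= sum(
            translation_prob(w, u, emb) * doc.count(u) / len(doc)
            for u in set(doc)
        )
    return score

# Hypothetical 2-d embeddings; all pairwise cosines are non-negative,
# which the normalisation above silently relies on.
EMB = {
    "car": [1.0, 0.1],
    "automobile": [0.9, 0.2],
    "banana": [0.1, 1.0],
}

doc_a = ["automobile", "automobile"]
doc_b = ["banana", "banana"]
score_a = query_likelihood(["car"], doc_a, EMB)
score_b = query_likelihood(["car"], doc_b, EMB)
print(score_a > score_b)
```

Neither toy document contains the query word "car", yet the document about "automobile" scores higher because its words translate to "car" with higher probability. This is the vocabulary-mismatch effect the translation model is meant to address.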

Relevance:

60.00%

Publisher:

Abstract:

Dissertation presented to the Escola Superior de Educação de Lisboa for the degree of Master in Early Intervention

Relevance:

30.00%

Publisher:

Abstract:

In this thesis, we examine certain properties of distributed word representations and propose a technique for enlarging the vocabulary of neural machine translation systems. First, we consider a well-known analogy-solving problem and examine the effect of position-dependent weights, the choice of combination function, and the impact of supervised learning. We then show that simple translation-based distributed representations can match or exceed the state of the art on the TOEFL synonym detection test and on the recent SimLex-999 gold standard. Finally, motivated by impressive results obtained with distributed representations from small-vocabulary (30,000 words) neural translation systems, we present a GPU-compatible approach for increasing the vocabulary size by more than an order of magnitude. Although originally developed only to obtain the distributed representations, we show that this technique works quite well on translation tasks, in particular from English to French (WMT'14).
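For readers unfamiliar with the analogy-solving task mentioned above, the following sketch shows the standard additive vector-offset baseline: for "a is to b as c is to ?", pick the word whose vector is closest in cosine to b - a + c. It does not reproduce the position-dependent weights or alternative combination functions studied in the thesis; the function names and the toy vectors in `EMB` are assumptions made for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def solve_analogy(a, b, c, emb):
    """Return the word d maximising cos(d, b - a + c), excluding a, b and c."""
    target = [y - x + z for x, y, z in zip(emb[a], emb[b], emb[c])]
    candidates = (w for w in emb if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(emb[w], target))

# Hypothetical 3-d embeddings built so that the second coordinate encodes
# gender and the third encodes royalty.
EMB = {
    "man": [1.0, 0.0, 0.2],
    "woman": [1.0, 1.0, 0.2],
    "king": [1.0, 0.0, 1.0],
    "queen": [1.0, 1.0, 1.0],
    "apple": [0.0, 0.2, 0.1],
}

answer = solve_analogy("man", "woman", "king", EMB)
print(answer)
```

Excluding the three question words from the candidates is the usual convention for this task; without it the nearest vector to the target is often one of the inputs.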

Relevance:

30.00%

Publisher:

Abstract:

The English writing system is notoriously irregular in its orthography at the phonemic level. It was therefore proposed that focusing beginner-spellers’ attention on sound-letter relations at the sub-syllabic level might improve spelling performance. This hypothesis was tested in Experiments 1 and 2 using a ‘clue word’ paradigm to investigate the effect of analogy teaching intervention / non-intervention on the spelling performance of an experimental group and controls. The results overall showed the intervention to be effective in improving spelling, and this effect to be enduring. Experiment 3 demonstrated a greater application of analogy in spelling, when clue words, which participants used in analogy to spell test words, remained in view during testing. A series of regression analyses, with spelling entered as the criterion variable and age, analogy and phonological plausibility (PP) as predictors, showed both analogy and PP to be highly predictive of spelling. Experiment 4 showed that children could use analogy to improve their spelling, even without intervention, by comparing their performance in spelling words presented in analogous categories or in random lists. Consideration of children’s patterns of analogy use at different points of development showed three age groups to use similar patterns of analogy, but contrasting analogy patterns for spelling different words. This challenges stage theories of analogy use in literacy. Overall the most salient units used in analogy were the rime and, to a slightly lesser degree, the onset-vowel and vowel. Finally, Experiment 5 showed analogy and phonology to be fairly equally influential in spelling, but analogy to be more influential than phonology in reading. Five separate experiments therefore found analogy to be highly influential in spelling. Experiment 5 also considered the role of memory and attention in literacy attainment. 
The important implications of this research are that analogy, rather than a purely phonics-based strategy, is instrumental in correct spelling in English.

Relevance:

30.00%

Publisher:

Abstract:

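With the development of society, and especially of the Internet, many languages see a considerable number of new words emerge every year. Words are the most basic and most important component of every language, so analysing the structural characteristics and word-formation methods of these new words is worthwhile. This article analyses the new words that appeared in Chinese in 2014 and the ways in which they were formed; its aim is to better understand the development and characteristics of the Chinese vocabulary. Taking the new words of 2014 published in 《2014汉语新词语》 (2014 Chinese New Words) as its corpus, the paper finds two main characteristics: first, compounding, derivation and abbreviation are the main word-formation methods for the new words that arose in 2014; second, 72 percent of the new words are polysyllabic (containing three or more syllables), and 80 percent are nouns. These characteristics illustrate the current features and development trends of the Chinese vocabulary, which differ from those of the traditional Chinese vocabulary.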

Relevance:

20.00%

Publisher:

Abstract:

This paper demonstrates how Indigenous Studies is controlled in some Australian universities in ways that continue the marginalisation, denigration and exploitation of Indigenous peoples. Moreover, it shows how the engagement of white notions of “inclusion” can result in the maintenance of racism, systemic marginalisation, white race privilege and radicalised subjectivity. A case study will be utilised which draws from the experience of two Indigenous scholars who were invited to be part of a panel to review one Australian university’s plan and courses in Indigenous studies. The case study offers the opportunity to destabilise the relationships between oppression and privilege and the epistemology that maintains them. The paper argues for the need to examine exactly what is being offered when universities provide opportunities for “inclusion”.

Relevance:

20.00%

Publisher:

Abstract:

In this paper, we propose an unsupervised segmentation approach, named "n-gram mutual information", or NGMI, which is used to segment Chinese documents into n-character words or phrases, using language statistics drawn from the Chinese Wikipedia corpus. The approach alleviates the tremendous effort required to prepare and maintain manually segmented Chinese text for training purposes and to maintain ever-expanding lexicons by hand. Previously, mutual information was used to achieve automated segmentation into 2-character words. NGMI extends this technique to handle longer n-character words. Experiments with heterogeneous documents from the Chinese Wikipedia collection show good results.
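To make the underlying idea concrete, here is a sketch of the simpler 2-character baseline the abstract refers to, not of NGMI itself: adjacent characters are joined when their pointwise mutual information exceeds a threshold, and split otherwise. The tiny corpus, the function names, and the threshold of 1.5 are all assumptions chosen for illustration; a real system would draw its statistics from a large corpus such as the Chinese Wikipedia.

```python
import math
from collections import Counter

def train_counts(corpus):
    """Count single characters and adjacent character pairs in the corpus."""
    chars, pairs = Counter(), Counter()
    for text in corpus:
        chars.update(text)
        pairs.update(text[i:i + 2] for i in range(len(text) - 1))
    return chars, pairs

def pmi(pair, chars, pairs):
    """Pointwise mutual information log(p(xy) / (p(x) * p(y))) of a pair."""
    if pairs[pair] == 0:
        return float("-inf")  # unseen pair: always treated as a boundary
    p_xy = pairs[pair] / sum(pairs.values())
    p_x = chars[pair[0]] / sum(chars.values())
    p_y = chars[pair[1]] / sum(chars.values())
    return math.log(p_xy / (p_x * p_y))

def segment(text, chars, pairs, threshold):
    """Join adjacent characters whose PMI exceeds the threshold."""
    if not text:
        return []
    words = [text[0]]
    for i in range(1, len(text)):
        if pmi(text[i - 1:i + 1], chars, pairs) > threshold:
            words[-1] += text[i]
        else:
            words.append(text[i])
    return words

# Hypothetical toy corpus standing in for real language statistics.
corpus = ["北京大学", "北京", "大学", "我爱北京", "我爱大学"]
chars, pairs = train_counts(corpus)
words = segment("我爱北京大学", chars, pairs, threshold=1.5)
print(words)
```

Because each merge decision looks only at one adjacent pair, this baseline chains characters greedily and has no direct notion of a longer word as a unit, which is the limitation that motivates extending mutual information to n-character sequences.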

Relevance:

20.00%

Publisher:

Abstract:

Review of 'Gatz', Elevator Repair Company / Brisbane Powerhouse, published in The Australian, 12 May 2009.

Relevance:

20.00%

Publisher:

Abstract:

The Thai written language is one of the languages that do not have word boundaries. In order to discover the meaning of a document, its text must be separated into syllables, words, sentences, and paragraphs. This paper develops a novel method to segment Thai text by combining a non-dictionary-based technique with a dictionary-based technique. The method first applies Thai grammar rules to the text to identify syllables. A hidden Markov model is then used to merge possible syllables into words. The identified words are verified against a lexical dictionary, and a decision tree is employed to discover the words not identified by the lexical dictionary. Documents used in the litigation process of Thai court proceedings have been used in the experiments. The segmented words obtained by the proposed method outperform the results obtained by other existing methods.

Relevance:

20.00%

Publisher:

Abstract:

The increasing diversity of the Internet has created a vast number of multilingual resources on the Web. A huge number of these documents are written in languages other than English. Consequently, the demand for searching in non-English languages is growing exponentially. It is desirable that a search engine can search for information over collections of documents in other languages. This research investigates techniques for developing high-quality Chinese information retrieval systems. A distinctive feature of Chinese text is that a Chinese document is a sequence of Chinese characters with no space or boundary between Chinese words. This feature makes Chinese information retrieval more difficult, since a retrieved document that contains the query term as a sequence of Chinese characters may not actually be relevant to the query: the query term (as a sequence of Chinese characters) may not be a valid Chinese word in that document. On the other hand, a document that is actually relevant may not be retrieved because it does not contain the query sequence but contains other relevant words. In this research, we propose two approaches to deal with these problems. In the first approach, we propose a hybrid Chinese information retrieval model that combines word-based techniques with the traditional character-based techniques. The aim of this approach is to investigate the influence of Chinese segmentation on the performance of Chinese information retrieval. Two ranking methods are proposed to rank retrieved documents by their relevance to the query, calculated by combining character-based ranking and word-based ranking. Our experimental results show that Chinese segmentation can improve the performance of Chinese information retrieval, but the improvement is not significant if only Chinese segmentation is combined with the traditional character-based approach.
In the second approach, we propose a novel query expansion method which applies text mining techniques to find the most relevant words with which to extend the query. Unlike most existing query expansion methods, which generally select highly frequent indexing terms from the retrieved documents to expand the query, our approach uses text mining techniques to find patterns in the retrieved documents that correlate highly with the query term and then uses the relevant words in those patterns to expand the original query. This research project develops and implements a Chinese information retrieval system for evaluating the proposed approaches. There are two stages in the experiments. The first stage investigates whether high-accuracy segmentation can improve Chinese information retrieval. In the second stage, a text mining based query expansion approach is implemented, and a further experiment compares the performance of the standard Rocchio approach with that of the proposed text mining based query expansion method. The NTCIR5 Chinese collections are used in the experiments. The experimental results show that by combining the text mining based query expansion with the hybrid model, significant improvement has been achieved in both precision and recall.
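As a point of reference for the baseline named above, the following is a minimal sketch of the textbook Rocchio update, q' = alpha * q + beta * mean(relevant) - gamma * mean(non-relevant), over sparse term-weight dictionaries. It illustrates the standard baseline only, not the text mining based expansion the abstract proposes. The parameter values (1.0, 0.75, 0.15) are commonly cited defaults used here as an assumption, and the function name and toy documents are hypothetical.

```python
def rocchio(query, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Return expanded query weights; terms with non-positive weight are dropped.

    query is a dict mapping term -> weight; relevant and non_relevant are
    lists of such dicts, one per judged (or pseudo-judged) document.
    """
    terms = set(query)
    for doc in relevant + non_relevant:
        terms |= set(doc)

    def centroid(docs, term):
        # Mean weight of the term across docs; 0.0 for an empty set.
        return sum(d.get(term, 0.0) for d in docs) / len(docs) if docs else 0.0

    expanded = {}
    for term in terms:
        weight = (alpha * query.get(term, 0.0)
                  + beta * centroid(relevant, term)
                  - gamma * centroid(non_relevant, term))
        if weight > 0:
            expanded[term] = weight
    return expanded

# Hypothetical toy data: one query, two relevant and one non-relevant document.
query = {"jaguar": 1.0}
relevant = [{"jaguar": 1.0, "cat": 2.0}, {"cat": 1.0, "jungle": 1.0}]
non_relevant = [{"jaguar": 1.0, "car": 2.0}]

expanded = rocchio(query, relevant, non_relevant)
print(sorted(expanded, key=expanded.get, reverse=True))
```

The expansion terms come purely from term weights in the feedback documents, which is the frequency-driven selection the abstract contrasts with pattern-based selection.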

Relevance:

20.00%

Publisher:

Abstract:

This paper reveals a journey of theatrical exploration. It is a journey of enquiry and investigation backed by a vigorous, direct and dense professional history of creative work.